Installation note: You may need to install Cairo on your operating system to run this notebook. See README for details.
if(!require(Cairo)) install.packages("Cairo", repos = "http://cran.us.r-project.org")
## Loading required package: Cairo
The purpose of this project is to predict the price of houses in California in 1990 based on a number of possible location-based predictors, including latitude, longitude, and information about houses within a particular block.
While this project focuses on prediction, we are fully aware, and want you the reader to be aware, that housing prices rose sharply after this period, fell when the bubble burst, and then rose again. This model should not be used to predict actual future prices; it is a purely academic exercise in statistical prediction.
The goal of the project is to create the model that can best predict home prices in California given reasonable test/train splits in the data.
We’re using the California Housing Prices dataset from the following Kaggle site: https://www.kaggle.com/camnugent/california-housing-prices. The data describe the houses in a given California district, with summary statistics based on the 1990 census.
We loaded housing.csv into R.
library(readr)
library(knitr)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
housing_data = read_csv("housing.csv")
## Parsed with column specification:
## cols(
## longitude = col_double(),
## latitude = col_double(),
## housing_median_age = col_double(),
## total_rooms = col_double(),
## total_bedrooms = col_double(),
## population = col_double(),
## households = col_double(),
## median_income = col_double(),
## median_house_value = col_double(),
## ocean_proximity = col_character()
## )
housing_data$median_house_value[1:100]
## [1] 452600 358500 352100 341300 342200 269700 299200 241400 226700 261100
## [11] 281500 241800 213500 191300 159200 140000 152500 155500 158700 162900
## [21] 147500 159800 113900 99700 132600 107500 93800 105500 108900 132000
## [31] 122300 115200 110400 104900 109700 97200 104500 103900 191400 176000
## [41] 155400 150000 118800 188800 184400 182300 142500 137500 187500 112500
## [51] 171900 93800 97500 104200 87500 83100 87500 85300 80300 60000
## [61] 75700 75000 86100 76100 73500 78400 84400 81300 85000 129200
## [71] 82500 95200 75000 67500 137500 177500 102100 108300 112500 131300
## [81] 162500 112500 112500 137500 118800 98200 118800 162500 137500 500001
## [91] 162500 137500 162500 187500 179200 130000 183800 125000 170000 193100
The dataset contains 20640 observations and 10 attributes (9 predictors and 1 response). Below is a list of the variables with descriptions taken from the original Kaggle site given above.
This dataset meets all of the stated criteria for the project including:
median_house_value
ocean_proximity
Let’s look at a summary of each column.
summary(housing_data) # Summary of each column. Note that total_bedrooms has 207 NA's, which we will need to impute.
## longitude latitude housing_median_age total_rooms
## Min. :-124.3 Min. :32.54 Min. : 1.00 Min. : 2
## 1st Qu.:-121.8 1st Qu.:33.93 1st Qu.:18.00 1st Qu.: 1448
## Median :-118.5 Median :34.26 Median :29.00 Median : 2127
## Mean :-119.6 Mean :35.63 Mean :28.64 Mean : 2636
## 3rd Qu.:-118.0 3rd Qu.:37.71 3rd Qu.:37.00 3rd Qu.: 3148
## Max. :-114.3 Max. :41.95 Max. :52.00 Max. :39320
##
## total_bedrooms population households median_income
## Min. : 1.0 Min. : 3 Min. : 1.0 Min. : 0.4999
## 1st Qu.: 296.0 1st Qu.: 787 1st Qu.: 280.0 1st Qu.: 2.5634
## Median : 435.0 Median : 1166 Median : 409.0 Median : 3.5348
## Mean : 537.9 Mean : 1425 Mean : 499.5 Mean : 3.8707
## 3rd Qu.: 647.0 3rd Qu.: 1725 3rd Qu.: 605.0 3rd Qu.: 4.7432
## Max. :6445.0 Max. :35682 Max. :6082.0 Max. :15.0001
## NA's :207
## median_house_value ocean_proximity
## Min. : 14999 Length:20640
## 1st Qu.:119600 Class :character
## Median :179700 Mode :character
## Mean :206856
## 3rd Qu.:264725
## Max. :500001
##
Initial exploration of the data showed that a few steps were needed to make the data more usable. First, we converted the categorical variable ocean_proximity from text to a factor.
housing_data$ocean_proximity = as.factor(housing_data$ocean_proximity)
ocean_proximity = housing_data$ocean_proximity
We see that the factor variable ocean_proximity has the following \(5\) levels: <1H OCEAN, INLAND, ISLAND, NEAR BAY, and NEAR OCEAN.
The other thing to consider is missing data.
sum(is.na(housing_data))
## [1] 207
total_bedrooms = housing_data$total_bedrooms
sum(is.na(total_bedrooms))
## [1] 207
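Both counts are 207, which implies that every missing value sits in total_bedrooms. A per-column count makes this explicit without comparing two totals. The following is a minimal sketch on a made-up data frame (`toy` and its values are assumptions for illustration, not the housing data):

```r
# Toy stand-in for housing_data; the values are made up for illustration
toy = data.frame(
  total_rooms    = c(880, 7099, 1467, 1274),
  total_bedrooms = c(129, NA, 190, NA),
  households     = c(126, 1138, 177, 219)
)

# Number of missing values in each column
na_counts = colSums(is.na(toy))
na_counts

# Names of the columns that contain at least one missing value
names(na_counts)[na_counts > 0]
```

Applied to housing_data, the same call should report 207 for total_bedrooms and 0 for every other column, consistent with the two sums above.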
There are \(207\) observations with missing values for total_bedrooms, and we need to decide how to handle them. However, total_bedrooms appears to be strongly linearly related to total_rooms, so the two may be collinear, in which case including total_bedrooms in the model would add little information. Further testing is required before we can make this decision.
plot(housing_data$total_bedrooms ~ housing_data$total_rooms, col="dodgerblue")
Another option is to fill in the missing total_bedrooms values with the median of total_bedrooms grouped by total_rooms, since the two are related.
library(tidyverse)
housing_data %>%
group_by(total_rooms) %>%
summarize(median.total_bedrooms = median(total_bedrooms, na.rm = TRUE))
## # A tibble: 5,926 x 2
## total_rooms median.total_bedrooms
## <dbl> <dbl>
## 1 2 2
## 2 6 2
## 3 8 1
## 4 11 11
## 5 12 4
## 6 15 4
## 7 16 4
## 8 18 3.5
## 9 19 12
## 10 20 4.5
## # … with 5,916 more rows
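The summary above has 5,926 groups, so many groups contain only one or two districts, and some may have no observed total_bedrooms at all. One way to turn the grouped medians into an actual fill-in step is sketched below in base R. The data frame `toy`, its values, and the fallback to the overall median are assumptions made for illustration; this is not the imputation used later in this report.

```r
# Toy stand-in for housing_data (values are made up for illustration)
toy = data.frame(
  total_rooms    = c(10, 10, 10, 20, 20, 30),
  total_bedrooms = c(2, 4, NA, 5, NA, NA)
)

# Median of total_bedrooms within each total_rooms group, ignoring NAs.
# A group with no observed values gets NA here.
group_median = ave(
  toy$total_bedrooms, toy$total_rooms,
  FUN = function(x) if (all(is.na(x))) NA_real_ else median(x, na.rm = TRUE)
)

# Assumed fallback: use the overall median when a group has no observed value
overall_median = median(toy$total_bedrooms, na.rm = TRUE)
fill_value = ifelse(is.na(group_median), overall_median, group_median)

# Keep observed values, fill in only the missing ones
toy$total_bedrooms_imputed = ifelse(
  is.na(toy$total_bedrooms), fill_value, toy$total_bedrooms
)
toy
```

Because exact total_rooms values rarely repeat, grouping on binned total_rooms might give more stable medians; that would be a design choice to test rather than a given.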
Looking at the structure of the dataset after this clean up, we see that besides the one factor variable ocean_proximity, we are left with nine numeric variables, three of which are continuous (longitude, latitude, and median_income) and six of which are discrete (housing_median_age, total_rooms, total_bedrooms, population, households, and median_house_value).
str(housing_data)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 20640 obs. of 10 variables:
## $ longitude : num -122 -122 -122 -122 -122 ...
## $ latitude : num 37.9 37.9 37.9 37.9 37.9 ...
## $ housing_median_age: num 41 21 52 52 52 52 52 52 42 52 ...
## $ total_rooms : num 880 7099 1467 1274 1627 ...
## $ total_bedrooms : num 129 1106 190 235 280 ...
## $ population : num 322 2401 496 558 565 ...
## $ households : num 126 1138 177 219 259 ...
## $ median_income : num 8.33 8.3 7.26 5.64 3.85 ...
## $ median_house_value: num 452600 358500 352100 341300 342200 ...
## $ ocean_proximity : Factor w/ 5 levels "<1H OCEAN","INLAND",..: 4 4 4 4 4 4 4 4 4 4 ...
## - attr(*, "spec")=
## .. cols(
## .. longitude = col_double(),
## .. latitude = col_double(),
## .. housing_median_age = col_double(),
## .. total_rooms = col_double(),
## .. total_bedrooms = col_double(),
## .. population = col_double(),
## .. households = col_double(),
## .. median_income = col_double(),
## .. median_house_value = col_double(),
## .. ocean_proximity = col_character()
## .. )
Let’s look a bit more closely at the distribution of the numeric variables.
par(mfrow = c(3, 3))
hist(housing_data$longitude, breaks = 20, main = "longitude", border="darkorange", col="dodgerblue")
hist(housing_data$latitude, breaks = 20, main = "latitude", border="darkorange", col="dodgerblue")
hist(housing_data$housing_median_age, breaks = 20, main = "housing_median_age", border="darkorange", col="dodgerblue")
hist(housing_data$total_rooms, breaks = 20, main = "total_rooms", border="darkorange", col="dodgerblue")
hist(housing_data$total_bedrooms, breaks = 20, main = "total_bedrooms", border="darkorange", col="dodgerblue")
hist(housing_data$population, breaks = 20, main = "population", border="darkorange", col="dodgerblue")
hist(housing_data$households, breaks = 20, main = "households", border="darkorange", col="dodgerblue")
hist(housing_data$median_income, breaks = 20, main = "median_income", border="darkorange", col="dodgerblue")
hist(housing_data$median_house_value, breaks = 20, main = "median_house_value", border="darkorange", col="dodgerblue")
And let’s look at the relationships between all the possible variables.
pairs(housing_data, col = "dodgerblue")
In addition to the already mentioned linear relationship between total_rooms and total_bedrooms, we will also need to look into potential collinearity between households and total_bedrooms (and potentially total_rooms).
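One common way to quantify collinearity is the variance inflation factor (VIF): regress each predictor on the others and compute \(1 / (1 - R^2)\). The sketch below does this in base R on simulated data. The variables `rooms`, `bedrooms`, and `income` are made up to mimic the situation described above, and the usual rule of thumb that a VIF above 5 or 10 signals trouble is a convention rather than a result from this report.

```r
set.seed(42)
n = 200

# Simulated predictors: bedrooms is nearly a linear function of rooms,
# while income is generated independently of both
rooms    = rnorm(n, mean = 2600, sd = 800)
bedrooms = 0.2 * rooms + rnorm(n, sd = 20)
income   = rnorm(n, mean = 3.9, sd = 1.5)
toy = data.frame(rooms, bedrooms, income)

# VIF for each predictor: regress it on the remaining predictors
vif = sapply(names(toy), function(v) {
  others = setdiff(names(toy), v)
  r2 = summary(lm(reformulate(others, response = v), data = toy))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```

In this simulation the two collinear predictors get large VIFs and the independent one stays close to 1. The same loop could be run on the numeric housing predictors once the missing values are handled.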
library(ggplot2)
#we want to look at shape of distribution to get a good idea of what to impute
ggplot(housing_data, aes(x = total_bedrooms)) +
geom_histogram(bins = 40) +
xlab("Total Bedrooms") +
ylab("Count") +
ggtitle("Histogram of Total Bedrooms (noncontinuous variable)")
## Warning: Removed 207 rows containing non-finite values (stat_bin).
#using mean for now
library(mice)
##
## Attaching package: 'mice'
## The following object is masked from 'package:tidyr':
##
## complete
## The following objects are masked from 'package:base':
##
## cbind, rbind
housing_data_temp = mice(data = housing_data, m = 5, method = "mean", seed = 420)
##
## iter imp variable
## 1 1 total_bedrooms
## 1 2 total_bedrooms
## 1 3 total_bedrooms
## 1 4 total_bedrooms
## 1 5 total_bedrooms
## 2 1 total_bedrooms
## 2 2 total_bedrooms
## 2 3 total_bedrooms
## 2 4 total_bedrooms
## 2 5 total_bedrooms
## 3 1 total_bedrooms
## 3 2 total_bedrooms
## 3 3 total_bedrooms
## 3 4 total_bedrooms
## 3 5 total_bedrooms
## 4 1 total_bedrooms
## 4 2 total_bedrooms
## 4 3 total_bedrooms
## 4 4 total_bedrooms
## 4 5 total_bedrooms
## 5 1 total_bedrooms
## 5 2 total_bedrooms
## 5 3 total_bedrooms
## 5 4 total_bedrooms
## 5 5 total_bedrooms
housing_data_full = complete(housing_data_temp, 1)
housing_data_full$ocean_proximity = as.factor(housing_data_full$ocean_proximity)
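Unconditional mean imputation is deterministic, so as far as we understand the "mean" method, the five imputed datasets requested with m = 5 should all contain the same filled-in values, and taking the first one loses nothing. The base R sketch below shows the equivalent operation on a made-up vector (`bedrooms` and its values are assumptions for illustration):

```r
# Toy vector standing in for housing_data$total_bedrooms (made-up values)
bedrooms = c(129, NA, 190, 235, NA)

# Replace each missing value with the mean of the observed values
observed_mean = mean(bedrooms, na.rm = TRUE)
bedrooms_filled = ifelse(is.na(bedrooms), observed_mean, bedrooms)
bedrooms_filled
```

Mean imputation leaves the column mean unchanged but shrinks its variance, which is one reason to treat it as a placeholder.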
housing_data_nc = housing_data_full[, -10] # drop ocean_proximity (column 10), the only non-numeric variable, for now
corrmatrix = cor(housing_data_nc)
kable(t(corrmatrix))
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| longitude | 1.0000000 | -0.9246644 | -0.1081968 | 0.0445680 | 0.0692597 | 0.0997732 | 0.0553101 | -0.0151759 | -0.0459666 |
| latitude | -0.9246644 | 1.0000000 | 0.0111727 | -0.0360996 | -0.0666584 | -0.1087847 | -0.0710354 | -0.0798091 | -0.1441603 |
| housing_median_age | -0.1081968 | 0.0111727 | 1.0000000 | -0.3612622 | -0.3189983 | -0.2962442 | -0.3029160 | -0.1190340 | 0.1056234 |
| total_rooms | 0.0445680 | -0.0360996 | -0.3612622 | 1.0000000 | 0.9272527 | 0.8571260 | 0.9184845 | 0.1980496 | 0.1341531 |
| total_bedrooms | 0.0692597 | -0.0666584 | -0.3189983 | 0.9272527 | 1.0000000 | 0.8739095 | 0.9747249 | -0.0076819 | 0.0494535 |
| population | 0.0997732 | -0.1087847 | -0.2962442 | 0.8571260 | 0.8739095 | 1.0000000 | 0.9072223 | 0.0048343 | -0.0246497 |
| households | 0.0553101 | -0.0710354 | -0.3029160 | 0.9184845 | 0.9747249 | 0.9072223 | 1.0000000 | 0.0130331 | 0.0658427 |
| median_income | -0.0151759 | -0.0798091 | -0.1190340 | 0.1980496 | -0.0076819 | 0.0048343 | 0.0130331 | 1.0000000 | 0.6880752 |
| median_house_value | -0.0459666 | -0.1441603 | 0.1056234 | 0.1341531 | 0.0494535 | -0.0246497 | 0.0658427 | 0.6880752 | 1.0000000 |
highcorr = findCorrelation(corrmatrix, cutoff = .60) # column indices that findCorrelation suggests removing to reduce pairwise correlations above the cutoff
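The result of findCorrelation is a vector of column indices, which does not show which pairs of variables are driving the high correlations. The base R sketch below lists the pairs directly. It uses a four-variable excerpt of the correlation table above; the names `corr_excerpt` and `high_pairs` are illustrative only.

```r
# Excerpt of the correlation matrix shown above (values copied from the table)
vars = c("total_rooms", "total_bedrooms", "households", "median_income")
corr_excerpt = matrix(
  c(1.0000000,  0.9272527, 0.9184845,  0.1980496,
    0.9272527,  1.0000000, 0.9747249, -0.0076819,
    0.9184845,  0.9747249, 1.0000000,  0.0130331,
    0.1980496, -0.0076819, 0.0130331,  1.0000000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Positions above the diagonal whose absolute correlation exceeds the cutoff
cutoff = 0.60
idx = which(abs(corr_excerpt) > cutoff & upper.tri(corr_excerpt), arr.ind = TRUE)

# One row per highly correlated pair
high_pairs = data.frame(
  var1 = rownames(corr_excerpt)[idx[, 1]],
  var2 = colnames(corr_excerpt)[idx[, 2]],
  cor  = corr_excerpt[idx]
)
high_pairs
```

On this excerpt the three size-related variables are all pairwise correlated above the cutoff, while median_income is not strongly correlated with any of them.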
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(RColorBrewer)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
plot_map = ggplot(housing_data_full,
aes(x = longitude, y = latitude, color = median_house_value, hma = housing_median_age,
tr = total_rooms, tb = total_bedrooms, hh = households, mi = median_income)) +
geom_point(aes(size = population), alpha = 0.4) +
xlab("Longitude") +
ylab("Latitude") +
ggtitle("Data Map - Longitude vs Latitude and Associated Variables") +
theme(plot.title = element_text(hjust = 0.5)) +
scale_color_distiller(palette = "Paired", labels = comma) +
labs(color = "Median House Value (in $USD)", size = "Population")
plot_map_tt = ggplotly(plot_map)
plot_map_tt
temp_housing_data = housing_data_full[housing_data_full$ocean_proximity != "ISLAND", ] # Drop the ISLAND proximity homes: there are only 5 among the roughly 20k observations.
start_mod = lm(median_house_value ~ (.)^2, data = temp_housing_data)
n = length(resid(start_mod))
back_bic_mod = step(start_mod, direction = "backward", k = log(n))
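The trace that follows labels its criterion "AIC", but because the penalty is set to k = log(n), the values are on the BIC scale. The sketch below checks this on a simulated model using extractAIC(), the function step() relies on for linear models. The data frame `toy` and the model are made up for illustration.

```r
set.seed(1)

# Simulated data: y depends on x1 only
toy = data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$y = 2 * toy$x1 + rnorm(50)

fit = lm(y ~ x1 + x2, data = toy)
n = length(resid(fit))

# extractAIC() returns c(equivalent degrees of freedom, criterion value)
aic_value = extractAIC(fit, k = 2)[2]       # penalty of 2 per parameter (AIC)
bic_value = extractAIC(fit, k = log(n))[2]  # penalty of log(n) per parameter (BIC)

# For lm fits the criterion is n * log(RSS / n) + k * edf
edf = length(coef(fit))
rss = sum(resid(fit)^2)
manual_bic = n * log(rss / n) + log(n) * edf

c(aic = aic_value, bic = bic_value, manual_bic = manual_bic)
```

Since log(n) exceeds 2 whenever n is at least 8, the BIC penalty is heavier than the AIC penalty here, so backward selection with k = log(n) tends to keep fewer terms.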
## Start: AIC=456726.7
## median_house_value ~ (longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity)^2
##
## Df Sum of Sq RSS AIC
## - total_bedrooms:ocean_proximity 3 3.3966e+10 8.2016e+13 456705
## - total_rooms:total_bedrooms 1 9.1245e+07 8.1982e+13 456717
## - housing_median_age:total_bedrooms 1 9.9335e+08 8.1983e+13 456717
## - households:ocean_proximity 3 8.0876e+10 8.2063e+13 456717
## - total_bedrooms:median_income 1 2.3709e+09 8.1985e+13 456717
## - longitude:households 1 1.8557e+10 8.2001e+13 456721
## - total_bedrooms:population 1 2.3950e+10 8.2006e+13 456723
## - latitude:households 1 2.6834e+10 8.2009e+13 456723
## - longitude:population 1 2.7677e+10 8.2010e+13 456724
## <none> 8.1982e+13 456727
## - households:median_income 1 4.3377e+10 8.2026e+13 456728
## - latitude:population 1 5.3803e+10 8.2036e+13 456730
## - longitude:total_bedrooms 1 6.1010e+10 8.2043e+13 456732
## - population:median_income 1 6.4331e+10 8.2047e+13 456733
## - total_rooms:ocean_proximity 3 1.4739e+11 8.2130e+13 456734
## - housing_median_age:total_rooms 1 7.6819e+10 8.2059e+13 456736
## - housing_median_age:median_income 1 8.7309e+10 8.2070e+13 456739
## - total_rooms:households 1 9.1399e+10 8.2074e+13 456740
## - population:households 1 9.2161e+10 8.2074e+13 456740
## - latitude:total_bedrooms 1 9.8267e+10 8.2080e+13 456741
## - housing_median_age:ocean_proximity 3 1.8076e+11 8.2163e+13 456742
## - longitude:total_rooms 1 1.2431e+11 8.2107e+13 456748
## - total_rooms:median_income 1 1.3340e+11 8.2116e+13 456750
## - latitude:total_rooms 1 1.5767e+11 8.2140e+13 456756
## - total_rooms:population 1 1.8232e+11 8.2165e+13 456763
## - median_income:ocean_proximity 3 3.1068e+11 8.2293e+13 456775
## - population:ocean_proximity 3 3.8433e+11 8.2367e+13 456793
## - longitude:latitude 1 3.9347e+11 8.2376e+13 456816
## - total_bedrooms:households 1 3.9597e+11 8.2378e+13 456816
## - housing_median_age:households 1 4.2291e+11 8.2405e+13 456823
## - longitude:housing_median_age 1 4.5976e+11 8.2442e+13 456832
## - latitude:housing_median_age 1 6.3099e+11 8.2613e+13 456875
## - longitude:median_income 1 6.5260e+11 8.2635e+13 456880
## - latitude:median_income 1 7.2664e+11 8.2709e+13 456899
## - housing_median_age:population 1 9.7874e+11 8.2961e+13 456962
## - latitude:ocean_proximity 3 1.2166e+12 8.3199e+13 457001
## - longitude:ocean_proximity 3 2.1565e+12 8.4139e+13 457233
##
## Step: AIC=456705.4
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:population +
## longitude:households + longitude:median_income + longitude:ocean_proximity +
## latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms +
## latitude:population + latitude:households + latitude:median_income +
## latitude:ocean_proximity + housing_median_age:total_rooms +
## housing_median_age:total_bedrooms + housing_median_age:population +
## housing_median_age:households + housing_median_age:median_income +
## housing_median_age:ocean_proximity + total_rooms:total_bedrooms +
## total_rooms:population + total_rooms:households + total_rooms:median_income +
## total_rooms:ocean_proximity + total_bedrooms:population +
## total_bedrooms:households + total_bedrooms:median_income +
## population:households + population:median_income + population:ocean_proximity +
## households:median_income + households:ocean_proximity + median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## - total_rooms:total_bedrooms 1 1.1063e+09 8.2017e+13 456696
## - housing_median_age:total_bedrooms 1 2.1540e+09 8.2018e+13 456696
## - total_bedrooms:median_income 1 2.9676e+09 8.2019e+13 456696
## - households:ocean_proximity 3 8.7377e+10 8.2104e+13 456698
## - total_bedrooms:population 1 1.3989e+10 8.2030e+13 456699
## - longitude:households 1 2.6159e+10 8.2042e+13 456702
## - longitude:population 1 2.7868e+10 8.2044e+13 456702
## - latitude:households 1 3.7451e+10 8.2054e+13 456705
## <none> 8.2016e+13 456705
## - households:median_income 1 4.5228e+10 8.2061e+13 456707
## - latitude:population 1 5.5671e+10 8.2072e+13 456709
## - population:median_income 1 6.4610e+10 8.2081e+13 456712
## - housing_median_age:total_rooms 1 7.3952e+10 8.2090e+13 456714
## - total_rooms:ocean_proximity 3 1.6112e+11 8.2177e+13 456716
## - housing_median_age:median_income 1 8.3276e+10 8.2099e+13 456716
## - total_rooms:households 1 8.8196e+10 8.2104e+13 456718
## - housing_median_age:ocean_proximity 3 1.7426e+11 8.2190e+13 456719
## - longitude:total_bedrooms 1 9.5540e+10 8.2112e+13 456720
## - longitude:total_rooms 1 1.3119e+11 8.2147e+13 456728
## - total_rooms:median_income 1 1.3250e+11 8.2149e+13 456729
## - population:households 1 1.4602e+11 8.2162e+13 456732
## - latitude:total_bedrooms 1 1.5925e+11 8.2175e+13 456736
## - latitude:total_rooms 1 1.7523e+11 8.2191e+13 456740
## - total_rooms:population 1 1.8910e+11 8.2205e+13 456743
## - median_income:ocean_proximity 3 3.0915e+11 8.2325e+13 456753
## - population:ocean_proximity 3 3.6036e+11 8.2377e+13 456766
## - longitude:latitude 1 3.9532e+11 8.2412e+13 456795
## - total_bedrooms:households 1 4.1177e+11 8.2428e+13 456799
## - housing_median_age:households 1 4.5132e+11 8.2468e+13 456809
## - longitude:housing_median_age 1 4.6022e+11 8.2476e+13 456811
## - latitude:housing_median_age 1 6.3264e+11 8.2649e+13 456854
## - longitude:median_income 1 6.8519e+11 8.2701e+13 456867
## - latitude:median_income 1 7.6876e+11 8.2785e+13 456888
## - housing_median_age:population 1 9.8590e+11 8.3002e+13 456942
## - latitude:ocean_proximity 3 1.2232e+12 8.3239e+13 456981
## - longitude:ocean_proximity 3 2.1653e+12 8.4181e+13 457213
##
## Step: AIC=456695.8
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:population +
## longitude:households + longitude:median_income + longitude:ocean_proximity +
## latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms +
## latitude:population + latitude:households + latitude:median_income +
## latitude:ocean_proximity + housing_median_age:total_rooms +
## housing_median_age:total_bedrooms + housing_median_age:population +
## housing_median_age:households + housing_median_age:median_income +
## housing_median_age:ocean_proximity + total_rooms:population +
## total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity +
## total_bedrooms:population + total_bedrooms:households + total_bedrooms:median_income +
## population:households + population:median_income + population:ocean_proximity +
## households:median_income + households:ocean_proximity + median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## - total_bedrooms:median_income 1 2.5553e+09 8.2020e+13 456686
## - housing_median_age:total_bedrooms 1 3.1234e+09 8.2020e+13 456687
## - households:ocean_proximity 3 8.7733e+10 8.2105e+13 456688
## - total_bedrooms:population 1 2.5262e+10 8.2043e+13 456692
## - longitude:population 1 2.8284e+10 8.2046e+13 456693
## - longitude:households 1 3.1457e+10 8.2049e+13 456694
## <none> 8.2017e+13 456696
## - latitude:households 1 4.0683e+10 8.2058e+13 456696
## - households:median_income 1 4.4212e+10 8.2062e+13 456697
## - latitude:population 1 5.6079e+10 8.2073e+13 456700
## - population:median_income 1 6.5561e+10 8.2083e+13 456702
## - housing_median_age:total_rooms 1 7.4381e+10 8.2092e+13 456705
## - total_rooms:ocean_proximity 3 1.6041e+11 8.2178e+13 456706
## - housing_median_age:median_income 1 8.3430e+10 8.2101e+13 456707
## - housing_median_age:ocean_proximity 3 1.7407e+11 8.2191e+13 456710
## - longitude:total_bedrooms 1 9.5982e+10 8.2113e+13 456710
## - total_rooms:median_income 1 1.3217e+11 8.2149e+13 456719
## - longitude:total_rooms 1 1.3427e+11 8.2152e+13 456720
## - latitude:total_bedrooms 1 1.5834e+11 8.2176e+13 456726
## - total_rooms:households 1 1.7352e+11 8.2191e+13 456729
## - latitude:total_rooms 1 1.7765e+11 8.2195e+13 456730
## - total_rooms:population 1 1.8828e+11 8.2206e+13 456733
## - median_income:ocean_proximity 3 3.0811e+11 8.2325e+13 456743
## - population:households 1 2.5710e+11 8.2274e+13 456750
## - population:ocean_proximity 3 3.5958e+11 8.2377e+13 456756
## - longitude:latitude 1 3.9452e+11 8.2412e+13 456785
## - total_bedrooms:households 1 4.1068e+11 8.2428e+13 456789
## - longitude:housing_median_age 1 4.5931e+11 8.2477e+13 456801
## - housing_median_age:households 1 4.8648e+11 8.2504e+13 456808
## - latitude:housing_median_age 1 6.3165e+11 8.2649e+13 456844
## - longitude:median_income 1 6.8719e+11 8.2704e+13 456858
## - latitude:median_income 1 7.7050e+11 8.2788e+13 456879
## - housing_median_age:population 1 9.8483e+11 8.3002e+13 456932
## - latitude:ocean_proximity 3 1.2224e+12 8.3240e+13 456971
## - longitude:ocean_proximity 3 2.1642e+12 8.4181e+13 457203
##
## Step: AIC=456686.5
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:population +
## longitude:households + longitude:median_income + longitude:ocean_proximity +
## latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms +
## latitude:population + latitude:households + latitude:median_income +
## latitude:ocean_proximity + housing_median_age:total_rooms +
## housing_median_age:total_bedrooms + housing_median_age:population +
## housing_median_age:households + housing_median_age:median_income +
## housing_median_age:ocean_proximity + total_rooms:population +
## total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity +
## total_bedrooms:population + total_bedrooms:households + population:households +
## population:median_income + population:ocean_proximity + households:median_income +
## households:ocean_proximity + median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## - housing_median_age:total_bedrooms 1 2.0299e+09 8.2022e+13 456677
## - households:ocean_proximity 3 8.7069e+10 8.2107e+13 456679
## - total_bedrooms:population 1 2.7022e+10 8.2047e+13 456683
## - longitude:population 1 2.8545e+10 8.2048e+13 456684
## - longitude:households 1 3.3347e+10 8.2053e+13 456685
## <none> 8.2020e+13 456686
## - latitude:households 1 4.3781e+10 8.2064e+13 456688
## - latitude:population 1 5.6298e+10 8.2076e+13 456691
## - population:median_income 1 6.3082e+10 8.2083e+13 456692
## - housing_median_age:total_rooms 1 7.3394e+10 8.2093e+13 456695
## - households:median_income 1 7.5800e+10 8.2096e+13 456696
## - total_rooms:ocean_proximity 3 1.5887e+11 8.2179e+13 456697
## - housing_median_age:median_income 1 8.3562e+10 8.2103e+13 456698
## - housing_median_age:ocean_proximity 3 1.7339e+11 8.2193e+13 456700
## - longitude:total_bedrooms 1 9.4292e+10 8.2114e+13 456700
## - longitude:total_rooms 1 1.3173e+11 8.2152e+13 456710
## - total_rooms:median_income 1 1.3243e+11 8.2152e+13 456710
## - latitude:total_bedrooms 1 1.5895e+11 8.2179e+13 456716
## - total_rooms:households 1 1.7110e+11 8.2191e+13 456720
## - latitude:total_rooms 1 1.7511e+11 8.2195e+13 456721
## - total_rooms:population 1 1.8608e+11 8.2206e+13 456723
## - median_income:ocean_proximity 3 3.0580e+11 8.2326e+13 456733
## - population:households 1 2.5813e+11 8.2278e+13 456741
## - population:ocean_proximity 3 3.5744e+11 8.2377e+13 456746
## - longitude:latitude 1 3.9378e+11 8.2414e+13 456775
## - total_bedrooms:households 1 4.0815e+11 8.2428e+13 456779
## - longitude:housing_median_age 1 4.6132e+11 8.2481e+13 456792
## - housing_median_age:households 1 4.9580e+11 8.2516e+13 456801
## - latitude:housing_median_age 1 6.3387e+11 8.2654e+13 456835
## - longitude:median_income 1 6.9069e+11 8.2711e+13 456850
## - latitude:median_income 1 7.7281e+11 8.2793e+13 456870
## - housing_median_age:population 1 9.8521e+11 8.3005e+13 456923
## - latitude:ocean_proximity 3 1.2204e+12 8.3240e+13 456961
## - longitude:ocean_proximity 3 2.1626e+12 8.4182e+13 457194
##
## Step: AIC=456677
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:population +
## longitude:households + longitude:median_income + longitude:ocean_proximity +
## latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms +
## latitude:population + latitude:households + latitude:median_income +
## latitude:ocean_proximity + housing_median_age:total_rooms +
## housing_median_age:population + housing_median_age:households +
## housing_median_age:median_income + housing_median_age:ocean_proximity +
## total_rooms:population + total_rooms:households + total_rooms:median_income +
## total_rooms:ocean_proximity + total_bedrooms:population +
## total_bedrooms:households + population:households + population:median_income +
## population:ocean_proximity + households:median_income + households:ocean_proximity +
## median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## - households:ocean_proximity 3 8.6819e+10 8.2109e+13 456669
## - longitude:population 1 2.8898e+10 8.2051e+13 456674
## - total_bedrooms:population 1 3.2757e+10 8.2055e+13 456675
## - longitude:households 1 3.7782e+10 8.2060e+13 456677
## <none> 8.2022e+13 456677
## - latitude:households 1 4.7102e+10 8.2069e+13 456679
## - latitude:population 1 5.6585e+10 8.2078e+13 456681
## - population:median_income 1 6.5167e+10 8.2087e+13 456684
## - households:median_income 1 7.6714e+10 8.2099e+13 456686
## - total_rooms:ocean_proximity 3 1.5800e+11 8.2180e+13 456687
## - housing_median_age:total_rooms 1 9.0793e+10 8.2113e+13 456690
## - longitude:total_bedrooms 1 9.3476e+10 8.2115e+13 456691
## - housing_median_age:median_income 1 9.4407e+10 8.2116e+13 456691
## - housing_median_age:ocean_proximity 3 1.7542e+11 8.2197e+13 456691
## - longitude:total_rooms 1 1.3060e+11 8.2152e+13 456700
## - total_rooms:median_income 1 1.3267e+11 8.2155e+13 456700
## - latitude:total_bedrooms 1 1.5781e+11 8.2180e+13 456707
## - total_rooms:households 1 1.6984e+11 8.2192e+13 456710
## - latitude:total_rooms 1 1.7370e+11 8.2196e+13 456711
## - total_rooms:population 1 1.8733e+11 8.2209e+13 456714
## - median_income:ocean_proximity 3 3.0455e+11 8.2326e+13 456724
## - population:households 1 2.7241e+11 8.2294e+13 456736
## - population:ocean_proximity 3 3.5668e+11 8.2379e+13 456737
## - longitude:latitude 1 3.9416e+11 8.2416e+13 456766
## - total_bedrooms:households 1 4.0745e+11 8.2429e+13 456769
## - longitude:housing_median_age 1 4.6683e+11 8.2489e+13 456784
## - latitude:housing_median_age 1 6.3894e+11 8.2661e+13 456827
## - longitude:median_income 1 6.8874e+11 8.2711e+13 456840
## - latitude:median_income 1 7.7086e+11 8.2793e+13 456860
## - housing_median_age:population 1 1.0009e+12 8.3023e+13 456917
## - latitude:ocean_proximity 3 1.2208e+12 8.3243e+13 456952
## - housing_median_age:households 1 1.2775e+12 8.3299e+13 456986
## - longitude:ocean_proximity 3 2.1610e+12 8.4183e+13 457184
##
## Step: AIC=456669.1
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:population +
## longitude:households + longitude:median_income + longitude:ocean_proximity +
## latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms +
## latitude:population + latitude:households + latitude:median_income +
## latitude:ocean_proximity + housing_median_age:total_rooms +
## housing_median_age:population + housing_median_age:households +
## housing_median_age:median_income + housing_median_age:ocean_proximity +
## total_rooms:population + total_rooms:households + total_rooms:median_income +
## total_rooms:ocean_proximity + total_bedrooms:population +
## total_bedrooms:households + population:households + population:median_income +
## population:ocean_proximity + households:median_income + median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## - longitude:population 1 1.7928e+10 8.2127e+13 456664
## - longitude:households 1 2.4690e+10 8.2133e+13 456665
## - total_bedrooms:population 1 3.2510e+10 8.2141e+13 456667
## <none> 8.2109e+13 456669
## - latitude:households 1 4.4200e+10 8.2153e+13 456670
## - latitude:population 1 4.8712e+10 8.2157e+13 456671
## - population:median_income 1 7.4432e+10 8.2183e+13 456678
## - longitude:total_bedrooms 1 8.3137e+10 8.2192e+13 456680
## - households:median_income 1 8.7278e+10 8.2196e+13 456681
## - housing_median_age:total_rooms 1 8.8601e+10 8.2197e+13 456681
## - housing_median_age:median_income 1 9.5562e+10 8.2204e+13 456683
## - housing_median_age:ocean_proximity 3 1.8170e+11 8.2290e+13 456685
## - longitude:total_rooms 1 1.2967e+11 8.2238e+13 456692
## - total_rooms:median_income 1 1.3273e+11 8.2241e+13 456692
## - latitude:total_bedrooms 1 1.5242e+11 8.2261e+13 456697
## - total_rooms:households 1 1.6981e+11 8.2279e+13 456702
## - total_rooms:population 1 1.9097e+11 8.2300e+13 456707
## - latitude:total_rooms 1 1.9999e+11 8.2309e+13 456709
## - total_rooms:ocean_proximity 3 2.8796e+11 8.2397e+13 456712
## - population:households 1 2.7898e+11 8.2388e+13 456729
## - population:ocean_proximity 3 4.0867e+11 8.2517e+13 456742
## - median_income:ocean_proximity 3 4.1706e+11 8.2526e+13 456744
## - longitude:latitude 1 3.8852e+11 8.2497e+13 456757
## - total_bedrooms:households 1 4.0019e+11 8.2509e+13 456759
## - longitude:housing_median_age 1 4.7191e+11 8.2581e+13 456777
## - latitude:housing_median_age 1 6.4854e+11 8.2757e+13 456821
## - longitude:median_income 1 6.9310e+11 8.2802e+13 456833
## - latitude:median_income 1 7.9933e+11 8.2908e+13 456859
## - housing_median_age:population 1 1.0055e+12 8.3114e+13 456910
## - latitude:ocean_proximity 3 1.2363e+12 8.3345e+13 456948
## - housing_median_age:households 1 1.3227e+12 8.3431e+13 456989
## - longitude:ocean_proximity 3 2.1824e+12 8.4291e+13 457181
##
## Step: AIC=456663.7
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:households +
## longitude:median_income + longitude:ocean_proximity + latitude:housing_median_age +
## latitude:total_rooms + latitude:total_bedrooms + latitude:population +
## latitude:households + latitude:median_income + latitude:ocean_proximity +
## housing_median_age:total_rooms + housing_median_age:population +
## housing_median_age:households + housing_median_age:median_income +
## housing_median_age:ocean_proximity + total_rooms:population +
## total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity +
## total_bedrooms:population + total_bedrooms:households + population:households +
## population:median_income + population:ocean_proximity + households:median_income +
## median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## - longitude:households 1 1.0557e+10 8.2137e+13 456656
## - total_bedrooms:population 1 2.5196e+10 8.2152e+13 456660
## - latitude:households 1 2.6992e+10 8.2154e+13 456660
## <none> 8.2127e+13 456664
## - population:median_income 1 8.2988e+10 8.2210e+13 456675
## - latitude:population 1 8.6665e+10 8.2213e+13 456675
## - housing_median_age:total_rooms 1 9.4969e+10 8.2222e+13 456678
## - households:median_income 1 9.7593e+10 8.2224e+13 456678
## - housing_median_age:median_income 1 1.0014e+11 8.2227e+13 456679
## - housing_median_age:ocean_proximity 3 1.8124e+11 8.2308e+13 456679
## - longitude:total_bedrooms 1 1.0440e+11 8.2231e+13 456680
## - total_rooms:median_income 1 1.3046e+11 8.2257e+13 456686
## - total_rooms:households 1 1.5955e+11 8.2286e+13 456694
## - total_rooms:population 1 1.7940e+11 8.2306e+13 456699
## - latitude:total_bedrooms 1 1.8561e+11 8.2312e+13 456700
## - longitude:total_rooms 1 2.0231e+11 8.2329e+13 456704
## - latitude:total_rooms 1 2.9682e+11 8.2423e+13 456728
## - population:households 1 2.9729e+11 8.2424e+13 456728
## - median_income:ocean_proximity 3 4.2540e+11 8.2552e+13 456740
## - total_bedrooms:households 1 3.8552e+11 8.2512e+13 456750
## - longitude:latitude 1 3.8593e+11 8.2513e+13 456750
## - total_rooms:ocean_proximity 3 4.8757e+11 8.2614e+13 456756
## - longitude:housing_median_age 1 4.6704e+11 8.2594e+13 456771
## - population:ocean_proximity 3 7.1419e+11 8.2841e+13 456813
## - latitude:housing_median_age 1 6.4266e+11 8.2769e+13 456815
## - longitude:median_income 1 7.1390e+11 8.2841e+13 456832
## - latitude:median_income 1 8.1651e+11 8.2943e+13 456858
## - housing_median_age:population 1 1.0090e+12 8.3136e+13 456906
## - latitude:ocean_proximity 3 1.2450e+12 8.3372e+13 456944
## - housing_median_age:households 1 1.3474e+12 8.3474e+13 456990
## - longitude:ocean_proximity 3 2.1964e+12 8.4323e+13 457178
##
## Step: AIC=456656.4
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:median_income +
## longitude:ocean_proximity + latitude:housing_median_age +
## latitude:total_rooms + latitude:total_bedrooms + latitude:population +
## latitude:households + latitude:median_income + latitude:ocean_proximity +
## housing_median_age:total_rooms + housing_median_age:population +
## housing_median_age:households + housing_median_age:median_income +
## housing_median_age:ocean_proximity + total_rooms:population +
## total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity +
## total_bedrooms:population + total_bedrooms:households + population:households +
## population:median_income + population:ocean_proximity + households:median_income +
## median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## - total_bedrooms:population 1 1.9534e+10 8.2157e+13 456651
## - latitude:households 1 2.1573e+10 8.2159e+13 456652
## <none> 8.2137e+13 456656
## - population:median_income 1 8.4172e+10 8.2221e+13 456668
## - latitude:population 1 9.8545e+10 8.2236e+13 456671
## - households:median_income 1 1.0403e+11 8.2241e+13 456673
## - housing_median_age:ocean_proximity 3 1.8748e+11 8.2325e+13 456674
## - housing_median_age:median_income 1 1.0984e+11 8.2247e+13 456674
## - housing_median_age:total_rooms 1 1.1277e+11 8.2250e+13 456675
## - total_rooms:median_income 1 1.2353e+11 8.2261e+13 456677
## - total_rooms:households 1 1.6280e+11 8.2300e+13 456687
## - longitude:total_bedrooms 1 1.7721e+11 8.2314e+13 456691
## - total_rooms:population 1 1.8512e+11 8.2322e+13 456693
## - longitude:total_rooms 1 1.9234e+11 8.2330e+13 456695
## - latitude:total_bedrooms 1 2.5537e+11 8.2393e+13 456710
## - latitude:total_rooms 1 2.8851e+11 8.2426e+13 456719
## - median_income:ocean_proximity 3 4.2590e+11 8.2563e+13 456733
## - total_bedrooms:households 1 3.8589e+11 8.2523e+13 456743
## - longitude:latitude 1 3.9955e+11 8.2537e+13 456747
## - total_rooms:ocean_proximity 3 4.7908e+11 8.2616e+13 456747
## - population:households 1 4.0817e+11 8.2545e+13 456749
## - longitude:housing_median_age 1 4.6702e+11 8.2604e+13 456763
## - latitude:housing_median_age 1 6.4257e+11 8.2780e+13 456807
## - population:ocean_proximity 3 7.3062e+11 8.2868e+13 456809
## - longitude:median_income 1 7.0672e+11 8.2844e+13 456823
## - latitude:median_income 1 8.0895e+11 8.2946e+13 456849
## - housing_median_age:population 1 1.0220e+12 8.3159e+13 456902
## - latitude:ocean_proximity 3 1.2725e+12 8.3410e+13 456944
## - housing_median_age:households 1 1.4900e+12 8.3627e+13 457017
## - longitude:ocean_proximity 3 2.2268e+12 8.4364e+13 457179
##
## Step: AIC=456651.3
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:median_income +
## longitude:ocean_proximity + latitude:housing_median_age +
## latitude:total_rooms + latitude:total_bedrooms + latitude:population +
## latitude:households + latitude:median_income + latitude:ocean_proximity +
## housing_median_age:total_rooms + housing_median_age:population +
## housing_median_age:households + housing_median_age:median_income +
## housing_median_age:ocean_proximity + total_rooms:population +
## total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity +
## total_bedrooms:households + population:households + population:median_income +
## population:ocean_proximity + households:median_income + median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## - latitude:households 1 2.1726e+10 8.2178e+13 456647
## <none> 8.2157e+13 456651
## - latitude:population 1 1.1018e+11 8.2267e+13 456669
## - housing_median_age:median_income 1 1.1078e+11 8.2267e+13 456669
## - housing_median_age:ocean_proximity 3 1.9057e+11 8.2347e+13 456669
## - housing_median_age:total_rooms 1 1.1619e+11 8.2273e+13 456671
## - households:median_income 1 1.2251e+11 8.2279e+13 456672
## - total_rooms:median_income 1 1.2384e+11 8.2281e+13 456672
## - population:median_income 1 1.3242e+11 8.2289e+13 456675
## - longitude:total_bedrooms 1 1.7604e+11 8.2333e+13 456686
## - longitude:total_rooms 1 1.8986e+11 8.2347e+13 456689
## - total_rooms:households 1 1.9171e+11 8.2348e+13 456689
## - latitude:total_bedrooms 1 2.5534e+11 8.2412e+13 456705
## - total_rooms:population 1 2.6676e+11 8.2423e+13 456708
## - latitude:total_rooms 1 2.8390e+11 8.2441e+13 456713
## - median_income:ocean_proximity 3 4.2500e+11 8.2582e+13 456728
## - longitude:latitude 1 3.9509e+11 8.2552e+13 456740
## - total_rooms:ocean_proximity 3 4.7901e+11 8.2636e+13 456741
## - longitude:housing_median_age 1 4.6885e+11 8.2626e+13 456759
## - latitude:housing_median_age 1 6.4334e+11 8.2800e+13 456802
## - population:ocean_proximity 3 7.2775e+11 8.2884e+13 456804
## - population:households 1 6.7935e+11 8.2836e+13 456811
## - total_bedrooms:households 1 6.8621e+11 8.2843e+13 456813
## - longitude:median_income 1 7.0865e+11 8.2865e+13 456819
## - latitude:median_income 1 8.1094e+11 8.2968e+13 456844
## - housing_median_age:population 1 1.0106e+12 8.3167e+13 456894
## - latitude:ocean_proximity 3 1.2720e+12 8.3429e+13 456939
## - housing_median_age:households 1 1.4778e+12 8.3635e+13 457009
## - longitude:ocean_proximity 3 2.2223e+12 8.4379e+13 457172
##
## Step: AIC=456646.9
## median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:median_income +
## longitude:ocean_proximity + latitude:housing_median_age +
## latitude:total_rooms + latitude:total_bedrooms + latitude:population +
## latitude:median_income + latitude:ocean_proximity + housing_median_age:total_rooms +
## housing_median_age:population + housing_median_age:households +
## housing_median_age:median_income + housing_median_age:ocean_proximity +
## total_rooms:population + total_rooms:households + total_rooms:median_income +
## total_rooms:ocean_proximity + total_bedrooms:households +
## population:households + population:median_income + population:ocean_proximity +
## households:median_income + median_income:ocean_proximity
##
## Df Sum of Sq RSS AIC
## <none> 8.2178e+13 456647
## - latitude:population 1 8.9938e+10 8.2268e+13 456659
## - housing_median_age:total_rooms 1 1.0965e+11 8.2288e+13 456664
## - housing_median_age:median_income 1 1.0979e+11 8.2288e+13 456664
## - housing_median_age:ocean_proximity 3 1.9301e+11 8.2371e+13 456665
## - total_rooms:median_income 1 1.1791e+11 8.2296e+13 456667
## - households:median_income 1 1.1897e+11 8.2297e+13 456667
## - population:median_income 1 1.2010e+11 8.2299e+13 456667
## - total_rooms:households 1 1.8960e+11 8.2368e+13 456684
## - longitude:total_bedrooms 1 1.9646e+11 8.2375e+13 456686
## - longitude:total_rooms 1 2.0465e+11 8.2383e+13 456688
## - total_rooms:population 1 2.7003e+11 8.2448e+13 456705
## - latitude:total_rooms 1 3.0622e+11 8.2485e+13 456714
## - median_income:ocean_proximity 3 4.3223e+11 8.2611e+13 456725
## - latitude:total_bedrooms 1 4.0071e+11 8.2579e+13 456737
## - longitude:latitude 1 4.1726e+11 8.2596e+13 456741
## - total_rooms:ocean_proximity 3 5.0035e+11 8.2679e+13 456742
## - longitude:housing_median_age 1 4.7676e+11 8.2655e+13 456756
## - latitude:housing_median_age 1 6.5598e+11 8.2834e+13 456801
## - total_bedrooms:households 1 6.7859e+11 8.2857e+13 456807
## - population:households 1 6.8453e+11 8.2863e+13 456808
## - population:ocean_proximity 3 7.7513e+11 8.2954e+13 456811
## - longitude:median_income 1 7.1905e+11 8.2897e+13 456817
## - latitude:median_income 1 8.2539e+11 8.3004e+13 456843
## - housing_median_age:population 1 9.9965e+11 8.3178e+13 456886
## - latitude:ocean_proximity 3 1.2632e+12 8.3442e+13 456932
## - housing_median_age:households 1 1.4589e+12 8.3637e+13 457000
## - longitude:ocean_proximity 3 2.2125e+12 8.4391e+13 457165
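Note that step() prints its criterion under the heading "AIC" regardless of the penalty used. Assuming the selection above was run with k = log(n), as the name back_bic_mod suggests, the values in the trace are BIC values on the scale used by extractAIC(). The following is a minimal sketch of that relationship on the built-in mtcars data rather than the housing data; the names fit, aic_scale, bic_scale, and bic_mod are made up for this illustration.

```r
# Illustration on mtcars (built-in), not the housing data.
fit = lm(mpg ~ wt + hp + qsec, data = mtcars)
n = nrow(mtcars)

# extractAIC() returns c(edf, criterion); step() uses this same scale.
aic_scale = extractAIC(fit, k = 2)       # AIC penalty
bic_scale = extractAIC(fit, k = log(n))  # BIC penalty

# The two criteria differ only through the per-parameter penalty.
edf = aic_scale[1]
all.equal(bic_scale[2] - aic_scale[2], (log(n) - 2) * edf)  # TRUE

# Backward selection with the BIC penalty; any printed trace would still say "AIC".
bic_mod = step(fit, direction = "backward", k = log(n), trace = 0)
formula(bic_mod)
```

Because the penalty per parameter is log(n) instead of 2, this kind of selection tends to keep fewer terms than plain AIC on large samples.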
summary(back_bic_mod)
##
## Call:
## lm(formula = median_house_value ~ longitude + latitude + housing_median_age +
## total_rooms + total_bedrooms + population + households +
## median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age +
## longitude:total_rooms + longitude:total_bedrooms + longitude:median_income +
## longitude:ocean_proximity + latitude:housing_median_age +
## latitude:total_rooms + latitude:total_bedrooms + latitude:population +
## latitude:median_income + latitude:ocean_proximity + housing_median_age:total_rooms +
## housing_median_age:population + housing_median_age:households +
## housing_median_age:median_income + housing_median_age:ocean_proximity +
## total_rooms:population + total_rooms:households + total_rooms:median_income +
## total_rooms:ocean_proximity + total_bedrooms:households +
## population:households + population:median_income + population:ocean_proximity +
## households:median_income + median_income:ocean_proximity,
## data = temp_housing_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -471806 -36852 -8923 25350 616260
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -7.296e+06 9.152e+05 -7.972
## longitude -5.241e+04 8.032e+03 -6.525
## latitude 2.929e+05 2.736e+04 10.705
## housing_median_age -7.643e+04 7.648e+03 -9.993
## total_rooms 6.752e+02 1.077e+02 6.267
## total_bedrooms -2.750e+03 5.218e+02 -5.271
## population -1.033e+02 1.829e+01 -5.651
## households -1.830e+02 1.618e+01 -11.309
## median_income -7.382e+05 6.002e+04 -12.300
## ocean_proximityINLAND 7.288e+05 2.479e+05 2.939
## ocean_proximityNEAR BAY -2.423e+07 1.239e+06 -19.561
## ocean_proximityNEAR OCEAN -7.422e+05 3.102e+05 -2.392
## longitude:latitude 2.179e+03 2.131e+02 10.223
## longitude:housing_median_age -9.658e+02 8.838e+01 -10.928
## longitude:total_rooms 8.930e+00 1.247e+00 7.160
## longitude:total_bedrooms -4.255e+01 6.066e+00 -7.015
## longitude:median_income -9.513e+03 7.088e+02 -13.420
## longitude:ocean_proximityINLAND 1.116e+04 2.949e+03 3.785
## longitude:ocean_proximityNEAR BAY -2.384e+05 1.092e+04 -21.835
## longitude:ocean_proximityNEAR OCEAN -5.991e+03 3.701e+03 -1.619
## latitude:housing_median_age -1.114e+03 8.688e+01 -12.818
## latitude:total_rooms 1.101e+01 1.257e+00 8.758
## latitude:total_bedrooms -6.141e+01 6.130e+00 -10.019
## latitude:population 2.513e+00 5.294e-01 4.746
## latitude:median_income -1.053e+04 7.320e+02 -14.379
## latitude:ocean_proximityINLAND 1.381e+04 3.079e+03 4.485
## latitude:ocean_proximityNEAR BAY -1.298e+05 8.441e+03 -15.372
## latitude:ocean_proximityNEAR OCEAN 1.057e+02 3.864e+03 0.027
## housing_median_age:total_rooms -3.385e-01 6.459e-02 -5.241
## housing_median_age:population -1.488e+00 9.405e-02 -15.824
## housing_median_age:households 7.216e+00 3.775e-01 19.116
## housing_median_age:median_income 1.218e+02 2.322e+01 5.244
## housing_median_age:ocean_proximityINLAND 9.414e+02 1.453e+02 6.479
## housing_median_age:ocean_proximityNEAR BAY -9.824e+01 1.608e+02 -0.611
## housing_median_age:ocean_proximityNEAR OCEAN 2.189e+02 1.324e+02 1.653
## total_rooms:population -3.332e-03 4.051e-04 -8.224
## total_rooms:households 8.135e-03 1.180e-03 6.891
## total_rooms:median_income 1.553e+00 2.857e-01 5.435
## total_rooms:ocean_proximityINLAND -1.346e+01 1.331e+00 -10.113
## total_rooms:ocean_proximityNEAR BAY 6.043e+00 2.412e+00 2.505
## total_rooms:ocean_proximityNEAR OCEAN -2.618e+00 1.491e+00 -1.756
## total_bedrooms:households -6.688e-02 5.130e-03 -13.037
## population:households 2.742e-02 2.094e-03 13.094
## population:median_income -3.624e+00 6.607e-01 -5.485
## population:ocean_proximityINLAND 2.963e+01 2.360e+00 12.553
## population:ocean_proximityNEAR BAY -1.362e+01 4.772e+00 -2.854
## population:ocean_proximityNEAR OCEAN 8.383e+00 2.852e+00 2.939
## households:median_income 1.382e+01 2.532e+00 5.459
## median_income:ocean_proximityINLAND 1.019e+04 1.058e+03 9.633
## median_income:ocean_proximityNEAR BAY 2.301e+03 1.031e+03 2.232
## median_income:ocean_proximityNEAR OCEAN 3.432e+03 8.250e+02 4.160
## Pr(>|t|)
## (Intercept) 1.64e-15 ***
## longitude 6.95e-11 ***
## latitude < 2e-16 ***
## housing_median_age < 2e-16 ***
## total_rooms 3.75e-10 ***
## total_bedrooms 1.37e-07 ***
## population 1.62e-08 ***
## households < 2e-16 ***
## median_income < 2e-16 ***
## ocean_proximityINLAND 0.003292 **
## ocean_proximityNEAR BAY < 2e-16 ***
## ocean_proximityNEAR OCEAN 0.016756 *
## longitude:latitude < 2e-16 ***
## longitude:housing_median_age < 2e-16 ***
## longitude:total_rooms 8.36e-13 ***
## longitude:total_bedrooms 2.37e-12 ***
## longitude:median_income < 2e-16 ***
## longitude:ocean_proximityINLAND 0.000154 ***
## longitude:ocean_proximityNEAR BAY < 2e-16 ***
## longitude:ocean_proximityNEAR OCEAN 0.105494
## latitude:housing_median_age < 2e-16 ***
## latitude:total_rooms < 2e-16 ***
## latitude:total_bedrooms < 2e-16 ***
## latitude:population 2.09e-06 ***
## latitude:median_income < 2e-16 ***
## latitude:ocean_proximityINLAND 7.31e-06 ***
## latitude:ocean_proximityNEAR BAY < 2e-16 ***
## latitude:ocean_proximityNEAR OCEAN 0.978167
## housing_median_age:total_rooms 1.61e-07 ***
## housing_median_age:population < 2e-16 ***
## housing_median_age:households < 2e-16 ***
## housing_median_age:median_income 1.59e-07 ***
## housing_median_age:ocean_proximityINLAND 9.43e-11 ***
## housing_median_age:ocean_proximityNEAR BAY 0.541162
## housing_median_age:ocean_proximityNEAR OCEAN 0.098260 .
## total_rooms:population < 2e-16 ***
## total_rooms:households 5.69e-12 ***
## total_rooms:median_income 5.55e-08 ***
## total_rooms:ocean_proximityINLAND < 2e-16 ***
## total_rooms:ocean_proximityNEAR BAY 0.012240 *
## total_rooms:ocean_proximityNEAR OCEAN 0.079153 .
## total_bedrooms:households < 2e-16 ***
## population:households < 2e-16 ***
## population:median_income 4.19e-08 ***
## population:ocean_proximityINLAND < 2e-16 ***
## population:ocean_proximityNEAR BAY 0.004322 **
## population:ocean_proximityNEAR OCEAN 0.003292 **
## households:median_income 4.85e-08 ***
## median_income:ocean_proximityINLAND < 2e-16 ***
## median_income:ocean_proximityNEAR BAY 0.025656 *
## median_income:ocean_proximityNEAR OCEAN 3.19e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 63190 on 20584 degrees of freedom
## Multiple R-squared: 0.7008, Adjusted R-squared: 0.7001
## F-statistic: 964.2 on 50 and 20584 DF, p-value: < 2.2e-16
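# Let's use weighted least squares (WLS) on this model to address potential heteroskedasticity.
back_bic_mod_fitted = fitted(back_bic_mod)
back_bic_mod_resid = resid(back_bic_mod)
# Model the log squared residuals as a quadratic in the fitted values.
# I() is required: inside a formula, a bare x^2 is a formula operator and reduces to x.
temp_wls_mod = lm(log(back_bic_mod_resid^2) ~ back_bic_mod_fitted + I(back_bic_mod_fitted^2))
ghat = fitted(temp_wls_mod)
hhat = exp(ghat) # estimated variance function
# Refit with weights inversely proportional to the estimated variance.
WLS_back_bic_mod = update(back_bic_mod, weights = 1 / hhat) # to do: figure out how to get the call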
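# Residuals vs. fitted values for the OLS fit.
plot(
back_bic_mod$fitted.values,
back_bic_mod$residuals,
xlab = "Fitted values", ylab = "Residuals", main = "OLS fit"
)
# Same plot for the WLS fit. Weighted residuals are used here, since the raw
# residuals of a weighted fit keep their unequal spread.
plot(
WLS_back_bic_mod$fitted.values,
weighted.residuals(WLS_back_bic_mod),
xlab = "Fitted values", ylab = "Weighted residuals", main = "WLS fit"
)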
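The variance-function approach to WLS can be checked on simulated data where the heteroskedasticity is known. The sketch below is illustrative only: the data, the seed, and the names x, y, ols, aux_bare, aux_quad, and wls are made up and are unrelated to the housing data. It also shows why a squared term in an R formula needs I().

```r
# Simulated data (hypothetical): the error spread grows with x.
set.seed(42)
n = 500
x = runif(n, 1, 10)
y = 2 + 3 * x + rnorm(n, sd = 0.5 * x)

ols = lm(y ~ x)
f = fitted(ols)
r = resid(ols)

# Without I(), ^ is a formula operator, so f^2 collapses to f.
aux_bare = lm(log(r^2) ~ f + f^2)
# With I(), the square is computed arithmetically and enters as its own term.
aux_quad = lm(log(r^2) ~ f + I(f^2))

length(coef(aux_bare))  # 2: intercept and f
length(coef(aux_quad))  # 3: intercept, f, and f^2

# Feasible WLS: weights are the inverse of the estimated variance function.
hhat = exp(fitted(aux_quad))
wls = lm(y ~ x, weights = 1 / hhat)

# Association between residual size and fitted values, before and after weighting.
cor(abs(r), f)
cor(abs(weighted.residuals(wls)), fitted(wls))
```

If the weights capture the variance pattern, the second correlation should be much closer to zero than the first.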